Correction of the Caulobacter crescentus NA1000 Genome Annotation
Abstract
Bacterial genome annotations are accumulating rapidly in the GenBank database, and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons, since non-coding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that a comparison of the location of peaks in third codon position GC content to the location of protein-coding regions could be used to verify the annotation of any genome that has a GC content greater than 60%.
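As a rough illustration of the third codon position GC signal described in the abstract, the following Python sketch computes, for a sliding window, the GC fraction at the third codon position in each of the three forward reading frames. It is only a minimal sketch under stated assumptions, not the procedure implemented by Artemis or MICheck: the function name `gc3_by_frame`, the window size, and the step size are all hypothetical choices made for this example.

```python
def gc3_by_frame(seq, window=120, step=30):
    """Return a list of (window_start, [gc3_frame0, gc3_frame1, gc3_frame2]).

    Assumptions (not taken from the paper): forward strand only, and
    `step` is a multiple of 3 so that frame numbering stays consistent
    from one window to the next.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions = []
        for frame in range(3):
            # Third codon positions for a reading frame starting at offset `frame`.
            thirds = chunk[frame + 2::3]
            gc = sum(base in "GC" for base in thirds)
            fractions.append(gc / len(thirds) if thirds else 0.0)
        results.append((start, fractions))
    return results


if __name__ == "__main__":
    # Toy sequence: every codon is ATG, so only frame 0 has G at position 3.
    for start, fractions in gc3_by_frame("ATG" * 40):
        print(start, fractions)
```

In a GC-rich genome, a window that lies inside a real coding region would be expected to show a clearly higher value in one frame than in the other two, which is the kind of peak the abstract compares against annotated gene locations.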
Similar resources
Identification of the in Vivo Function of the High-Efficiency d-Mannonate Dehydratase in Caulobacter crescentus NA1000 from the Enolase Superfamily
The d-mannonate dehydratase (ManD) subgroup of the enolase superfamily contains members with varying catalytic activities (high-efficiency, low-efficiency, or no activity) that dehydrate d-mannonate and/or d-gluconate to 2-keto-3-deoxy-d-gluconate [Wichelecki, D. J., et al. (2014) Biochemistry 53, 2722-2731]. Despite extensive in vitro characterization, the in vivo physiological role of a ManD ...
A comparison of the Caulobacter NA1000 and K31 genomes reveals extensive genome rearrangements and differences in metabolic potential
The genus Caulobacter is found in a variety of habitats and is known for its ability to thrive in low-nutrient conditions. K31 is a novel Caulobacter isolate that has the ability to tolerate copper and chlorophenols, and can grow at 4 °C with a doubling time of 40 h. K31 contains a 5.5 Mb chromosome that codes for more than 5500 proteins and two large plasmids (234 and 178 kb) that code for 43...
Food size and cGMP affects feeding behaviour in Pristionchus pacificus
Methods: For this study I used P. pacificus PS312 and the mutants Ppa-egl-4, which is a null mutation in the cGMP-dependent protein kinase, and Ppa-obi-1, which is an oriental beetle pheromone insensitive mutant, as well as the double mutant Ppa-egl-4;obi-1. I tested these strains on plates containing no food and on E. coli OP50, HB101, Caulobacter crescentus (NA1000) and Bacillus subtilis. I analyzed ...
Complete Genome Sequence of Caulobacter crescentus Siphophage Sansa
Caulobacter crescentus is a Gram-negative dimorphic model organism used to study cell differentiation. Siphophage Sansa is a newly isolated siphophage with an icosahedral capsid that infects C. crescentus. Sansa shares no sequence similarity to other phages deposited in GenBank. Here, we describe its genome sequence and general features.
Complete Genome Sequence of Caulobacter crescentus Siphophage Seuss
Caulobacter crescentus is a water-dwelling bacterium known to have a dimorphic life cycle. Here, we announce the complete genome of Seuss, a C. crescentus icosahedral siphophage, and describe key features. Seuss is unique among phages deposited in GenBank, with genes encoding novel hypothetical proteins composing 45% of its genome.